1 Introduction


To build a dynamic index that takes both
data and query distribution into account.


Traditional Data Structures 
B-Tree, HashMaps 
2 Previous Static Model [11


Hierarchical NN Model 

Use Data distribution to predict position. 

Position = CDF(Key) * Total No. of Data 
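
A minimal sketch of the position formula above. It assumes the CDF is the exact empirical CDF of a sorted key list, whereas a learned index would approximate it with a trained model; the helper names `empirical_cdf` and `predict_position` are illustrative and not from the original work.

```python
from bisect import bisect_left


def empirical_cdf(sorted_keys, key):
    """Fraction of stored keys that are strictly smaller than `key`."""
    return bisect_left(sorted_keys, key) / len(sorted_keys)


def predict_position(sorted_keys, key):
    # Position = CDF(Key) * Total No. of Data
    # round() guards against floating-point results such as 3.9999999.
    return round(empirical_cdf(sorted_keys, key) * len(sorted_keys))


keys = [3, 8, 15, 21, 42, 57, 64, 90]
pos = predict_position(keys, 42)
print(pos, keys[pos])
```

With an exact CDF the predicted position is the true one; a trained model only approximates the CDF, so a real lookup would follow the prediction with a local search around it.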
□ Training Mechanism:

Input: train data

Initialize
Stage_No = 0

Train models of Stage_No with the
corresponding train data

Stage_No = Stage_No + 1
For each key of each model: Next_Stage_Model_No =
Model(Key) * (# of models in next stage)

End
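
The training loop above can be sketched as follows. This is only an illustration under stated assumptions: each model is a least-squares line rather than a neural network, every model outputs a normalized CDF estimate in [0, 1], and stage 0 holds a single model. The names `fit_linear`, `train_staged`, and `stage_sizes` are hypothetical.

```python
def fit_linear(keys, targets):
    """Least-squares line target ~ slope * key + intercept."""
    n = len(keys)
    if n == 0:
        return 0.0, 0.0
    mean_k = sum(keys) / n
    mean_t = sum(targets) / n
    var = sum((k - mean_k) ** 2 for k in keys)
    if var == 0:
        return 0.0, mean_t
    cov = sum((k - mean_k) * (t - mean_t) for k, t in zip(keys, targets))
    slope = cov / var
    return slope, mean_t - slope * mean_k


def train_staged(sorted_keys, stage_sizes):
    """Train one list of (slope, intercept) models per stage."""
    assert stage_sizes[0] == 1, "assumption: stage 0 has a single model"
    n = len(sorted_keys)
    # Each record is (key, normalized position of the key).
    data = [(k, i / n) for i, k in enumerate(sorted_keys)]
    buckets = [data]  # train data per model of the current stage
    stages = []
    for stage_no in range(len(stage_sizes)):
        # Train models of Stage_No with the corresponding train data.
        models = [fit_linear([k for k, _ in b], [c for _, c in b])
                  for b in buckets]
        stages.append(models)
        if stage_no + 1 == len(stage_sizes):
            break
        # Route each key: Next_Stage_Model_No = Model(Key) * (# of models in next stage)
        next_size = stage_sizes[stage_no + 1]
        next_buckets = [[] for _ in range(next_size)]
        for (slope, intercept), bucket in zip(models, buckets):
            for key, cdf in bucket:
                estimate = min(max(slope * key + intercept, 0.0), 1.0)
                model_no = min(int(estimate * next_size), next_size - 1)
                next_buckets[model_no].append((key, cdf))
        buckets = next_buckets
    return stages


stages = train_staged(list(range(0, 1000, 10)), [1, 4])
print(len(stages), [len(s) for s in stages])
```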



□ Searching Mechanism: 
Input: data key

Initialize
Stage_No = 0
Model_No = 0
3 Foundation of Our Model


□ Overcome limitations of the previous model

1. Build a dynamic hierarchical model with
variable depth of stages.

2. Remove the constraint that the number of
models must increase gradually with
depth.

□ Take query distribution into account as well


4 Motivation 


□ Query distribution has not been considered until now

□ Need faster data retrieval for the most frequently
queried data


5 Our Approach 



□ Searching Mechanism: 

Differs from the previous model in two cases:

1) Checking condition: "Is leaf model?" instead of
"Is last stage?"

2) Equation for the next-stage model number:

Model_No = Pos * (# of models in next stage) / (Total number of data in that stage)
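
A minimal sketch of this search over a hand-built tree with variable depth. It assumes that each model predicts a position local to the data it covers, that "Total number of data in that stage" means the number of records covered by the current model, and that each model stores the global offset of its first record. The `Node` class and its fields are hypothetical, not the original implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    slope: float
    intercept: float
    total: int       # number of records this model covers
    offset: int = 0  # global position of its first record (assumption)
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

    def predict(self, key):
        # Local position, clamped to the range covered by this model.
        pos = self.slope * key + self.intercept
        return min(max(pos, 0.0), self.total - 1)


def search(root, key):
    """Return (predicted global position, number of stages descended)."""
    node, depth = root, 0
    while not node.is_leaf():  # "Is leaf model?" instead of "Is last stage?"
        pos = node.predict(key)
        # Model_No = Pos * (# of models in next stage) / total data in that stage
        model_no = int(pos * len(node.children) / node.total)
        node = node.children[min(model_no, len(node.children) - 1)]
        depth += 1
    return node.offset + round(node.predict(key)), depth


keys = [0, 2, 4, 6, 8, 10, 12, 14]
left = Node(0.5, 0.0, total=4, offset=0)        # leaf at depth 1
right = Node(0.5, -4.0, total=4, offset=4, children=[
    Node(0.5, -4.0, total=2, offset=4),         # leaves at depth 2
    Node(0.5, -6.0, total=2, offset=6),
])
root = Node(0.5, 0.0, total=8, children=[left, right])
print(search(root, 4), search(root, 12))
```

Because the loop stops at the first leaf rather than at a fixed last stage, keys in the left half are resolved after one stage and keys in the right half after two.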










6 Experiments 


□ Experimented with four different data
distributions:

o Linear
o Exponential
o Normal
o Log-Normal
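
One way such key sets could be generated, using only the Python standard library. The distribution parameters, the reading of "Linear" as evenly spaced keys, and the function name `generate_keys` are assumptions for illustration; the original experimental setup is not specified here.

```python
import random


def generate_keys(kind, n, seed=0):
    """Return n sorted keys drawn from the named distribution."""
    rng = random.Random(seed)  # fixed seed for reproducible key sets
    if kind == "linear":
        keys = [float(i) for i in range(n)]  # evenly spaced keys
    elif kind == "exponential":
        keys = [rng.expovariate(1.0) for _ in range(n)]
    elif kind == "normal":
        keys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    elif kind == "log-normal":
        keys = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    else:
        raise ValueError(f"unknown distribution: {kind}")
    return sorted(keys)


datasets = {kind: generate_keys(kind, 1000)
            for kind in ("linear", "exponential", "normal", "log-normal")}
```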


□ Performance Metrics:

> Model Error 

> Model Build Time 

> Search Time 

> Total Time 
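
A hedged sketch of how these four metrics could be collected. It assumes that Model Error is the mean absolute difference between predicted and true position and that Total Time is build time plus search time; neither definition is stated in the slides. `evaluate`, `build_line`, and `predict_line` are illustrative names, and the single-line model is only a stand-in for an actual index.

```python
import time
from bisect import bisect_left


def evaluate(build_fn, predict_fn, sorted_keys, queries):
    """Measure model error, build time, search time and total time."""
    start = time.perf_counter()
    model = build_fn(sorted_keys)
    build_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_fn(model, q) for q in queries]
    search_time = time.perf_counter() - start

    # Assumed definition: mean absolute position error over all queries.
    errors = [abs(p - bisect_left(sorted_keys, q))
              for p, q in zip(predictions, queries)]
    return {
        "model_error": sum(errors) / len(errors),
        "build_time": build_time,
        "search_time": search_time,
        "total_time": build_time + search_time,  # assumed definition
    }


def build_line(keys):
    # Stand-in model: one straight line through the first and last key.
    lo, hi = keys[0], keys[-1]
    return lo, (len(keys) - 1) / (hi - lo)


def predict_line(model, key):
    lo, scale = model
    return round((key - lo) * scale)


keys = list(range(0, 200, 2))
report = evaluate(build_line, predict_line, keys, keys)
print(report["model_error"])
```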


7 Results 


□ Results For Gaussian Data Distribution: 
□ Results are similar for the three other distributions


8 Findings 


□ Model Error, Model Build Time, and Total Time are
significantly lower than in the previous model

□ Search Time is slightly greater than in the previous model
for large datasets

□ Real Life Application:

> Share Market Data
> Traffic Data

In both cases the data distribution changes rapidly
and the model needs to be built frequently


9 Future Work 


□ Build a Data and Query Distribution Driven 
Model 

□ Experiment with different types of data and
query distributions

□ Extend the model to multi-dimensional data